Input Devices
OCR Engine
Parser
Best Keyword Identifier
Keyword Translator
Search Module And Results Analyzer
Document Storage
Personal Disks
Online Database
Search Engine
Copyright Clearinghouse
Service Manager
Summarizer
Result Output Module
Similar Document Locator
Digital Rights Manager
Translator
Output Devices
FIG. 1 



202 Input Document
206 Perform OCR To Identify Text In The Input Document Image(s)
208 Parse Input Document For Text And Perform OCR On Any Embedded Image(s) To Identify Text In Embedded Image(s)
210 Identify Best Keywords In Tokenized Text (See FIG. 3)
212 Develop Query Using Best Keywords And Search For Similar Documents Using Developed Query; Analyze Results And Repeat If Necessary (See FIG. 4)
214 Results Sufficient (Include All Results If 218 Performed)?
216 If (Last) Input Document OCRed Or Input Document Is A Partial Document, Was There An Exact Match Detected But Few Additional Documents Identified?
218 Exact Match Document Assigned To Be The Input Document To Identify Additional Documents
220 Apply Services And Summarize Search Results
222 Deliver Summary And Token Representation Of Identified Document(s) Including Any Results Of Services Performed
FIG. 2 
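The control flow of FIG. 2 can be sketched as a small driver loop. This is a minimal sketch under stated assumptions, not the system's implementation: the callables `identify_keywords`, `search`, `sufficient`, `exact_match` and `fetch_text` are hypothetical stand-ins for steps 210, 212, 214, the exact-match test of 216, and retrieval of a stored document. OCR and parsing (206/208) and the services of 220/222 are left out.

```python
from typing import Callable, List, Optional


def find_similar_documents(
    text: str,
    ocr_or_partial: bool,
    identify_keywords: Callable[[str], List[str]],           # hypothetical: step 210 (FIG. 3)
    search: Callable[[List[str]], List[str]],                 # hypothetical: step 212 (FIG. 4)
    sufficient: Callable[[List[str]], bool],                  # hypothetical: step 214
    exact_match: Callable[[str, List[str]], Optional[str]],   # hypothetical: part of step 216
    fetch_text: Callable[[str], str],                         # hypothetical: load a stored document
) -> List[str]:
    """Hypothetical driver for the FIG. 2 loop; returns accumulated document ids."""
    results: List[str] = []
    while True:
        keywords = identify_keywords(text)
        for doc_id in search(keywords):
            if doc_id not in results:      # keep earlier results once 218 has run
                results.append(doc_id)
        if sufficient(results):
            return results                 # continue with 220 and 222
        # Step 216: only an OCRed or partial input is retried through an exact match.
        match = exact_match(text, results) if ocr_or_partial else None
        if match is None:
            return results
        # Step 218: the exact match becomes the new, complete input document.
        text = fetch_text(match)
        ocr_or_partial = False
```

Because step 218 replaces a partial or OCRed input with a complete stored document, the sketch allows at most one retry, which keeps the loop finite.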



Step 210 (From 206/208):
Tokenize Extracted Text To Define A List Of Keywords
Normalize List Of Keywords
Initialize Weight Of All Keywords In List Of Keywords To A Predefined Value (e.g., W_TD = -1)
Delete All Keywords In The List Of Keywords That Are Identified As Stop Words
For All Keywords In The List Of Keywords:
(A) Identify Keywords In DS Dictionary Of Words And The Phrases In Which They Are Used;
(B) Identify Combinations Of Keywords In The List That Satisfy The Longest Phrase;
(C) Determine The Frequency Of Occurrence Of Keywords And Phrases In The Document, F_TD;
(D) Set Linguistic Frequency Of Occurrence Of Keywords And Phrases To A Predefined Small Value (e.g., F_T = 1)
For All Keywords In The List Of Keywords:
(A) For Each Keyword In The List Identified In The DS Dictionary Of Words, Override The Linguistic Frequency (F_T = 1) With The Value Found In The Linguistic Frequencies Of Keywords;
(B) For All Other Keywords, Look Up F_T In The Database Of Linguistic Frequencies (If One Exists);
(C) Limit The Number Of Occurrences Of Keywords To A Maximum (e.g., If F_TD > 2 Then F_TD = 2);
(D) If F_TD And F_T Are Assigned To A Keyword In The List, Compute Weight W_TD
For All Keywords In The List Of Keywords That Do Not Exist In The Linguistic Frequencies Or In The DS Dictionary Of Words (e.g., W_TD = -1):
If Keyword Is A Regular Expression Then Assign A Weight Of 1 (e.g., W_TD = 1)
Else Cache And Remove Keyword From The List
FIG. 3 
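The keyword-weighting steps of FIG. 3 can be sketched as below. This is a hedged illustration, not the patented method: FIG. 3 does not give the formula for W_TD, so the sketch assumes a TF-IDF-style weight, F_TD times log(CORPUS_SIZE / F_T), with F_T read as occurrences per million words. `CORPUS_SIZE`, `PATTERNS` (the "regular expression" keywords, here any token containing a digit) and the tokenizer are hypothetical choices. The longest-phrase step (B) of the first loop is omitted for brevity.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Set, Tuple

CORPUS_SIZE = 1_000_000  # hypothetical: F_T read as occurrences per million words
MAX_F_TD = 2             # cap on document frequency (e.g., if F_TD > 2 then F_TD = 2)
# Hypothetical "regular expression" keywords: tokens containing a digit, e.g. part numbers.
PATTERNS = [re.compile(r"[a-z]*\d[a-z\d-]*")]


def weigh_keywords(
    text: str,
    stop_words: Set[str],
    ds_words: Set[str],                 # DS dictionary of words
    linguistic_freq: Dict[str, float],  # database of linguistic frequencies F_T
) -> Tuple[Dict[str, float], List[str]]:
    """Return (keyword -> W_TD, cached keywords), loosely following FIG. 3."""
    # Tokenize and normalize (lower-case), then delete stop words.
    tokens = [t.lower() for t in re.findall(r"[\w-]+", text)]
    f_td = Counter(t for t in tokens if t not in stop_words)   # F_TD per keyword
    weights: Dict[str, float] = {k: -1.0 for k in f_td}        # W_TD initialised to -1
    cache: List[str] = []
    for k, count in f_td.items():
        if k in ds_words:
            # Default small F_T = 1, overridden when a linguistic frequency is known.
            f_t = linguistic_freq.get(k, 1.0)
        elif k in linguistic_freq:
            f_t = linguistic_freq[k]
        elif any(p.fullmatch(k) for p in PATTERNS):
            weights[k] = 1.0           # regular-expression keyword gets W_TD = 1
            continue
        else:
            cache.append(k)            # unknown keyword: cache it and drop it from the list
            del weights[k]
            continue
        # Assumed TF-IDF-style weight with capped F_TD; not stated in FIG. 3.
        weights[k] = min(count, MAX_F_TD) * math.log(CORPUS_SIZE / f_t)
    return weights, cache
```

Capping F_TD keeps a word repeated many times in one document from dominating, while a small F_T (a rare word) raises the weight under the assumed formula.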



Step 212 (From 210):
Define A List Of N (e.g., N = 5) Best Keywords With The Greatest Weight And With A Maximum Of One Keyword That Was Only A DS Dictionary Keyword; Set A Keyword Threshold Weight To The Lowest Weight In The List Of Best Keywords
Replace N (e.g., N = 5) Keywords In The List Of Best Keywords With A Weight Lower Than The Keyword Threshold Weight And With A Maximum Of One Keyword That Was Only A DS Dictionary Keyword (Which May Have A Weight Greater Than The Keyword Threshold Weight); Adjust The Keyword Threshold Weight To The Lowest Weight In The List Of Best Keywords
Develop Query Using List Of Best Keywords
Perform Query And Assemble Search Results
Compute Weights For An Extracted List Of Keywords From Assembled Search Results As Performed At 210 (See FIG. 3) For The Input Document
Compute Distance Measurement Between Input Document And Each Document In Search Results (See FIG. 5 And FIG. 6)
Perform Query Reduction By Removing One Keyword In The List Of Best Keywords That Has The Smallest Weight And Was Not Only A DS Dictionary Keyword
FIG. 4 
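The best-keyword selection, threshold replacement and query-reduction boxes of FIG. 4 can be sketched as two helpers. This is a minimal sketch, assuming keyword weights come from the FIG. 3 step and that `ds_only` is a hypothetical set holding the keywords that were only DS dictionary keywords. Query syntax, the search call and the distance measurement are left out.

```python
from typing import Dict, List, Optional, Set, Tuple


def best_keywords(
    weights: Dict[str, float],
    ds_only: Set[str],
    n: int = 5,
    threshold: Optional[float] = None,
) -> Tuple[List[str], Optional[float]]:
    """Pick up to n keywords by descending weight with at most one DS-only keyword.

    With a threshold (the replacement step), ordinary keywords must weigh less than
    the threshold; the single DS-only keyword is exempt. Returns the list and the
    new threshold, i.e. the lowest weight in the list.
    """
    chosen: List[str] = []
    used_ds = False
    for k in sorted(weights, key=lambda kw: (-weights[kw], kw)):
        if len(chosen) == n:
            break
        if k in ds_only:
            if used_ds:
                continue              # at most one DS-only keyword per list
            used_ds = True
        elif threshold is not None and weights[k] >= threshold:
            continue                  # replacement list takes only lower weights
        chosen.append(k)
    if not chosen:
        return chosen, threshold
    return chosen, min(weights[k] for k in chosen)


def reduce_query(chosen: List[str], weights: Dict[str, float], ds_only: Set[str]) -> List[str]:
    """Query reduction: drop the lowest-weight keyword that is not DS-only."""
    candidates = [k for k in chosen if k not in ds_only]
    if not candidates:
        return list(chosen)
    drop = min(candidates, key=lambda kw: weights[kw])
    return [k for k in chosen if k != drop]
```

The decision points that choose between replacing the list and reducing the query are not legible in FIG. 4, so the sketch leaves that policy to the caller.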



FIG. 5 (distance measurement between the input document and each search-result document, referenced in FIG. 4; drawing text not recoverable)



FIG. 6 (distance measurement, referenced in FIG. 4 together with FIG. 5; drawing text not recoverable)


